A Novel Parallel Processing Model for Noise Reduction and Temperature Compensation of MEMS Gyroscope

To eliminate the noise and temperature drift in an Micro-Electro-Mechanical Systems (MEMS) gyroscope’s output signal for improving measurement accuracy, a parallel processing model based on Multi-objective particle swarm optimization based on variational modal decomposition-time-frequency peak filter (MOVMD–TFPF) and Beetle antennae search algorithm- Elman neural network (BAS–Elman NN) is established. Firstly, variational mode decomposition (VMD) is optimized by multi-objective particle swarm optimization (MOPSO); then, the best decomposition parameters [kbest,abest] can be obtained. Secondly, the gyroscope output signals are decomposed by VMD optimized by MOPSO (MOVMD); then, the intrinsic mode functions (IMFs) obtained after decomposition are classified into a noise segment, mixed segment, and drift segment by sample entropy (SE). According to the idea of a parallel model, the noise segment can be discarded directly, the mixed segment is denoised by time-frequency peak filtering (TFPF), and the drift segment is compensated at the same time. In the compensation part, the beetle antennae search algorithm (BAS) is adopted to optimize the network parameters of the Elman neural network (Elman NN). Subsequently, the double-input/single-output temperature compensation model based on the BAS-Elman NN is established to compensate the drift segment, and these processed segments are reconstructed to form the final gyroscope output signal. Experimental results demonstrate the superiority of this parallel processing model; the angle random walk of the compensated gyroscope output is decreased from 0.531076 to 5.22502 × 10−3°/h/√Hz, and its bias stability is decreased from 32.7364°/h to 0.140403°/h, respectively.


Introduction
Thanks to the emergence and progress of Micro-Electro-Mechanical Systems (MEMS) technology, the research and application of MEMS inertial devices have attracted extensive attention [1][2][3][4][5]. As one of the outstanding representatives, MEMS gyroscopes are used in aerospace, consumer electronics and other high-precision control and measurement fields widely because of low power consumption, small size, easy integration, high reliability, and high precision [6,7]. Although the MEMS gyroscope has many superior advantages, its performance is affected by the temperature drift due to its manufacturing process and inherent structural characteristics, so how to effectively eliminate the influence on the gyroscope has become a research hotspot [8][9][10][11][12][13][14][15][16][17][18].
In general, the methods to reduce the influence of temperature drift on gyroscope performance are classified into two categories: hardware temperature compensation and software temperature compensation. The hardware temperature compensation method is used to reduce temperature error from the perspective of device optimization, including improving the manufacturing process and optimizing the sensor structure and circuit control system. Many domestic and foreign experts have proposed different hardware compensation schemes, which have made good progress [9][10][11][12][13][14][15][16]. Cui et al. [9] analyzed the driving modes and sensing modes of the gyroscope, and they designed a model for temperature compensation according to the vibration characteristics of the driven mode using a variable temperature resistor. Fu et al. [10] designed a constant transconductance high linear amplifier that achieves low phase drift and low amplitude drift interface circuits over all temperature ranges. Guo et al. [11] designed a resonant MEMS gyroscope with a self-temperature compensation function to eliminate temperature errors, in which the gyroscope's sensing unit is composed of four double-ended tuning forks with symmetrical distribution. Based on the bipolar temperature compensation method, the sensing mode closed-loop method for the gyroscope is introduced by Cao et al. [12]. However, hardware compensation has a long research and development cycle, high consumption, and is not easy to realize.
Another method is the software compensation; the corresponding relationship between the temperature input and the gyroscope output is obtained by establishing the temperature compensation model. In it, a set of experimental temperature and gyroscope output obtained in advance are trained as input and output to establish the model. The temperature compensation model has two problems to solve: one is the noise in the high-frequency component, and the other is the drift caused by temperature change in the low-frequency component. Due to the different processing methods, the temperature compensation model can be divided into serial processing and parallel processing models. Serial processing is to denoise the entire output signal first and then establish the compensation model to eliminate the temperature drift. The method of first denoising and then compensation will lead to the loss of useful signals in the original signal, leading to unsatisfactory compensation results [17][18][19][20][21][22].
Different from the serial processing model, the parallel processing model is used to extract the noise component and drift component of the signal and then carry out noise reduction and compensation processing respectively at the same time and get the final signal through reconstruction. As two representative adaptive decomposition algorithms, empirical mode decomposition (EMD) and variational mode decomposition (VMD) are used in the engineering fields widely. However, mode mixing affects the decomposition effect of EMD, and VMD has improved the mode mixing by constructing and solving the variational model [23].
Unfortunately, the proper decomposition parameters of VMD should be selected before using. When the decomposition number k is set unreasonably, over-decomposition or under-decomposition will occur. On the other hand, the larger the penalty factor α, the wider the bandwidth of the intrinsic mode function, and vice versa, which affects the decomposition accuracy of VMD [24]. Therefore, it is significant to select the appropriate VMD decomposition parameters [k,a]. Thanks to the emergence of intelligent algorithms such as swarm optimization algorithms and neural networks, many scholars have used these algorithms to optimize the VMD [25][26][27]. These optimization algorithms realize the purpose of optimization by constructing the single objective function, which only considers the problem in one aspect, while the multi-objective optimization algorithm comprehensively considers the optimization of the target from many aspects and can obtain the global optimal characteristics. As one of the multi-objective optimization algorithms, multi-objective particle swarm optimization (MOPSO) [28] has been successfully applied to the engineering fields in view of its simple theory, fast convergence, strong global optimization ability, flexible parameter adjustment mechanism, and other characteristics.
In addition, the establishment of a high-precision compensation model is an important link in the parallel processing model; intelligent algorithms are competent at establishing corresponding relationships between temperature and gyroscope output. For example, BP neural network, extreme learning machine (ELM), support vector machine (SVM), and the improvement of these algorithms have been widely used in temperature compensation and achieved good results [17][18][19][20][21]29]. Among them, Elman neural network (Elman NN) is a dynamic recursive network with feedback; the increase in feedback layer makes the network have time-varying adaptive characteristics, thus increasing the global stability of the network. As an improvement of BP neural network, it also inherits its disadvantages to some extent, which will lead to local optimization [22]. To improve the processing accuracy of Elman NN, the beetle antennae search algorithm (BAS) is adopted to optimize the Elman NN.
Therefore, the parallel processing model for eliminating noise and temperature drift based on MOVMD-TFPF and BAS-Elman NN is put forward. Firstly, MOPSO is adopted to determine the optimal decomposition parameters of VMD; then, the gyroscope output signal is decomposed by MOVMD. Secondly, the intrinsic mode functions (IMFs) obtained after decomposition are classified by sample entropy (SE) into a noise segment, mixed segment, and drift segment. Then, the noise segment is discarded directly, the mixed segment is denoised by TFPF, and the drift segment is compensated by double-input/singleoutput temperature compensation model based on the BAS-Elman NN at the same time. Finally, the final gyroscope output signal can be get through reconstructing the processed segments, details of the algorithm theory, experimental process, and comparative analysis are given in the following sections.

Dual-Mass MEMS Gyroscope's Structure
A dual-mass MEMS gyroscope is adopted for the temperature experiments in this article, and the overall structure is shown in Figure 1 [30]. When the gyroscope works, there are two main modes: the drive mode and sense mode. Between them, the drive comb, drive spring, and other parts constitute the drive mode, which is moving on the X-axis direction; the sense comb, sense spring, and other sections compose the sense mode, which is moving along the Y-axis. In Figure 1, the two sensitive masses (left and right masses) are the parts of both modes, which vibrate along the negative direction of the X-axis. In addition, the two modes of the gyroscope are isolated from each other to avoid the generation of coupling displacement. When the angular velocity Ωz is input around the Z-axis, the Coriolis force generated by the vibrating mass is passed to the frame along the Y-axis and later examined through a monitoring electric circuit.
The gyroscope works on the basis of tuning fork theory. The U-type connecting spring is connected to the dual-drive mass block, and the drive spring is connected to the dualsense mass block. In order to analyze the gyroscope's working modes, Ansys Software (ANSYS, Pittsburgh, PA, USA) is adopted to simulate these working modes; the results are shown in Figure 2 [30]. It can be found that the frequency gap between the first and the fourth working mode is large, which is more than 1000 Hz, and the fourth mode of the gyroscope has a quality factor of more than 2000, so the fourth mode is the gyroscope drive mode. Figure 2a-d show the analysis results of the gyroscope's four working modes in turn. The drive in-phase mode is the first mode, in which the vibration direction of the double mass of the gyroscope is consistent with the X-axis. The second mode is the sensing in-phase mode, in which the two masses simultaneously vibrate consistent with the Y-axis direction. The third mode is the sensing anti-phase mode; in this mode, the two mass blocks oscillate in the opposite direction of the Y-axis. The fourth mode is the drive anti-phase mode, in which the two mass blocks of the gyroscope oscillate in the contrary direction to the Y-axis. In addition, the resonant frequencies of these four modes are 2623 Hz, 3342 Hz, 3468 Hz, and 3484 Hz. The fourth mode is not only the drive anti-phase mode but also the true driving mode. Combined with the above analysis, it can be analyzed that the two Micromachines 2021, 12, 1285 4 of 23 mass blocks of the gyroscope have two degrees of freedom (X-axis and Y-axis), while the drive mode and frame has only a single degree of freedom (X-axis or Y-axis).  The gyroscope works on the basis of tuning fork theory. The U-type connecting spring is connected to the dual-drive mass block, and the drive spring is connected to the dual-sense mass block. In order to analyze the gyroscope's working modes, Ansys Software (ANSYS, Pittsburgh, PA, USA) is adopted to simulate these working modes; the results are shown in Figure 2 [30]. It can be found that the frequency gap between the first and the fourth working mode is large, which is more than 1000 Hz, and the fourth mode of the gyroscope has a quality factor of more than 2000, so the fourth mode is the gyroscope drive mode. Figure 2a-d show the analysis results of the gyroscope's four working modes in turn. The drive in-phase mode is the first mode, in which the vibration direction of the double mass of the gyroscope is consistent with the X-axis. The second mode is the anti-phase mode, in which the two mass blocks of the gyroscope oscillate in the contrary direction to the Y-axis. In addition, the resonant frequencies of these four modes are 2623 Hz, 3342 Hz, 3468 Hz, and 3484 Hz. The fourth mode is not only the drive anti-phase mode but also the true driving mode. Combined with the above analysis, it can be analyzed that the two mass blocks of the gyroscope have two degrees of freedom (X-axis and Y-axis), while the drive mode and frame has only a single degree of freedom (X-axis or Yaxis).

Gyroscope's Periphery Circuit
In Figure 3 [31], because the drive circuit is controlled by an AGC closed-loop, the drive mode can maintain constant amplitude motion at the resonant frequency. Then, the sense circuit detects the displacement of the sense mode and completes signal processing.
For the drive loop, the drive sense comb is used to measure the shift of the drive frame x(t), and the x(t) is then converted into a voltage signal Vsdr by a linear transformation of the X/V converter, which is amplified into Vsd. To satisfy the phase required for the AC drive signal VdAC = VdACASin(wdt) as much as possible, the phase of Vsd should be delayed by 90°. After that, the signal VdACA is extracted through a full-wave rectifier and low-pass filter, and the extracted signal is contrasted to the reference voltage Vf at the same Figure 2. The four working modes of gyroscope. (a-d) are the four modes of the gyroscope respectively, and their corresponding frequencies are w 1 = 2623 × 2π rad/s, w 2 =3342 × 2π rad/s; w 3 = 3468 × 2π rad/s; w 4 =3484 × 2π rad/s.

Gyroscope's Periphery Circuit
In Figure 3 [31], because the drive circuit is controlled by an AGC closed-loop, the drive mode can maintain constant amplitude motion at the resonant frequency. Then, the sense circuit detects the displacement of the sense mode and completes signal processing.
For the drive loop, the drive sense comb is used to measure the shift of the drive frame x(t), and the x(t) is then converted into a voltage signal V sdr by a linear transformation of the X/V converter, which is amplified into V sd . To satisfy the phase required for the AC drive signal V dAC = V dACA Sin(w d t) as much as possible, the phase of V sd should be delayed by 90 • . After that, the signal V dACA is extracted through a full-wave rectifier and low-pass filter, and the extracted signal is contrasted to the reference voltage V f at the same time. Then, the control signal V dI can be obtained when the output signal of the comparator passes through the integrator controller, and V dI is used to modulate V dAC to get the drive AC signal V AC. Finally, the DC signal V DC is driven to superimpose with V AC to form a force that can create an excitation for the drive mode. time. Then, the control signal VdI can be obtained when the output signal of the comparator passes through the integrator controller, and VdI is used to modulate VdAC to get the drive AC signal VAC. Finally, the DC signal VDC is driven to superimpose with VAC to form a force that can create an excitation for the drive mode. For the sense circuit, it is open-loop and uses the same interface as the drive circuit. The motion signals of the left and right sensitive mass blocks are captured by the differential detection amplifier, and then, the second differential amplifier is adopted to process the output signal to generate sense motion signal Vs. Then, Vs is demodulated by VdAC and then denoised by the low-pass filter to obtain the final sensitive motion signal Vso; for the signal Vso, the compensation module "B" or compensation algorithms can be added to compensate it and finally get the compensation signal Vo.

Variational Mode Decomposition (VMD)
The VMD is an effective decomposition method for processing nonstationary signals. Different from EMD, which decomposes complex signals by recursion-filter decomposition, VMD decomposes complex signals by non-recursive decomposition and constructing a variational model. The optimal solution of the variational model is searched through cyclic iterative processing, which means that the complex signals are decomposed into many IMFs, and each IMF has the center frequency and limited bandwidth. This enables VMD to avoid the mode aliasing phenomenon existing in EMD and has better noise robustness. The decomposition principle of VMD is briefly described as follows [32].
(1) The construction of constrained variational model.
with a center frequency and finite bandwidth; the variational model is constructed to seek the optimal mode functions so as to minimize the sum of estimated bandwidths of all intrinsic mode functions. The variational model is constructed as follows: a. Hilbert transformation is performed on the obtained mode functions to obtain their analytic signals; the purpose is to get the unilateral spectrum of each mode function.
To obtain the constrained variational model, the center frequency of each modal analytical signal obtained in Equation (1) is initialized; then, the square norm of the demodulation signal gradient is calculated, and the bandwidth of each IMF is estimated: For the sense circuit, it is open-loop and uses the same interface as the drive circuit. The motion signals of the left and right sensitive mass blocks are captured by the differential detection amplifier, and then, the second differential amplifier is adopted to process the output signal to generate sense motion signal V s . Then, V s is demodulated by V dAC and then denoised by the low-pass filter to obtain the final sensitive motion signal V so ; for the signal V so , the compensation module "B" or compensation algorithms can be added to compensate it and finally get the compensation signal V o .

Variational Mode Decomposition (VMD)
The VMD is an effective decomposition method for processing nonstationary signals. Different from EMD, which decomposes complex signals by recursion-filter decomposition, VMD decomposes complex signals by non-recursive decomposition and constructing a variational model. The optimal solution of the variational model is searched through cyclic iterative processing, which means that the complex signals are decomposed into many IMFs, and each IMF has the center frequency and limited bandwidth. This enables VMD to avoid the mode aliasing phenomenon existing in EMD and has better noise robustness. The decomposition principle of VMD is briefly described as follows [32].
(1) The construction of constrained variational model.
Suppose that any complex signal y(t) is decomposed into k IMFs {u k (t)} = {u 1 (t), u 2 (t), u 3 (t), . . . , u k (t)} with a center frequency and finite bandwidth; the variational model is constructed to seek the optimal mode functions so as to minimize the sum of estimated bandwidths of all intrinsic mode functions. The variational model is constructed as follows: a. Hilbert transformation is performed on the obtained mode functions to obtain their analytic signals; the purpose is to get the unilateral spectrum of each mode function.
b. To obtain the constrained variational model, the center frequency of each modal analytical signal obtained in Equation (1) is initialized; then, the square norm of the demodulation signal gradient is calculated, and the bandwidth of each IMF is estimated: where {θ k } = {θ 1 , θ 2 , . . . , θ k } is the collection of central frequencies of each IMF.
(2) The solution of the constrained variational model. a. To simplify the constrained variational model, the unconstrained variational model is constructed by constructing an extended Lagrangian expression. In Equation (3), a and λ are the penalty factor and Lagrangian multiplication operator.
b. The corresponding extremum solution can be obtained by transforming the Lagrangian function obtained by Equation (3) in the time-frequency domain. The expressions for u k and θ k are as follows, respectively: c. The alternating direction multiplier algorithm is adopted to update the parameters u k n+1 , θ k n+1 , and λ n+1 , and the updated formula of λ n+1 is: In Equation (6), τ is the time constant factor, which affects the update of λ. If the accuracy is not strictly required, the update can be avoided. In this case, τ = 0. d. When the condition of Equation (7) is satisfied, the iteration stops, and k intrinsic mode functions are output. Otherwise, the iteration continues by following the formulas above.

Multi-Objective Particle Swarm Optimization
The MOPSO algorithm is a widely used intelligent algorithm, which combines the particle swarm optimization (PSO) and the grid algorithm; it advances the original single target optimization to multiple targets. It is based on the predation behavior of birds, and it has excellent convergence speed and good overall search ability. The reasonable selection of multiple fitness functions is also the key to MOPSO; fuzzy entropy (FE) and permutation entropy (PE) are selected as fitness functions of MOPSO to optimize the VMD in this article.
Fuzzy entropy (FE) is an algorithm that can reflect the complex components of the measured nonlinear signal; the more sparse the signal is, the greater the fuzzy entropy. At the same time, FE is also an improvement on the approximate entropy; it is less dependent on time series and more robust to noise-containing signals, and the brief introduction to fuzzy entropy (FE) is as follows [33]: Step 1. Reconstruct phase space. For the time series {s(p), 1 ≤ p ≤ N}, phase space reconstruction is carried out to obtain m-dimensional vectors: Here, X p m is m consecutive values of s starting at the pth point and subtracting the mean s 0 (p): Step 2. Define the distance between vectors. D m pq is the maximum difference between vector X m p and X m q , namely: Step 3. Compute the membership degree between vectors. The membership degree of vector X m p and X m q is defined as µ(d m pq ,θ,ω), which is: In the formula, the fuzzy function is defined as µ(d m pq ,θ,ω), which is an exponential function, and the gradient and width of its boundary are denoted as θ and ω.
Step 4. Define the function.
When N is a finite value, Equation (14) is simplified as follows: Another fitness function permutation entropy (PE) is introduced as follows. PE is first proposed by Bandt et al. [34], which can be used to calculate the complexity and randomness of complex signals. PE is used to measure the noise level of the signal in this paper, and the principle of PE is as follows: Step 1. Reconstruct phase space. For the time series {u(j), 1 ≤ j ≤ N}, phase space reconstruction is carried out to get a phase sequence: . . .
Here, e is the embedded dimension, k + (e − 1)ω = n, R(l) represents the reconstructed vector, there are a total of k reconstruction vectors, and the delay time is denoted as ω.
Step 2. Rearrange the reconstructed vectors. Each reconstructed vector is rearranged according to the size; then, the column indexes of elements in the vector are obtained to form a set of symbol sequences {h 1 ,h 2 ,h 3 ...,h m }, namely: When h p < h q , that is: Step 3. The calculation and normalization. After the rearrangement, calculate the probability of each symbol sequence and denote them as P 1 , P 2 ..., P r , and the calculation formula of permutation entropy is: The maximum of permutation entropy is ln e!; normalize the permutation entropy, that is: The normalized permutation entropy can be used to calculate the complexity and randomness of complex signals: that is, the larger the permutation entropy is, the higher the complexity and randomness of complex signals will be, and vice versa.
The brief description of the steps of the MOPSO algorithm is as follows [24]. A. Firstly, the key parameters of MOPSO are set, including the total particle number N P , maximum iteration number M, save set size N R , etc. The number of particles affects the searching ability of MOPSO. When the number of particles is set too large, the algorithm has a good global searching ability, but it will affect the speed of the algorithm. B. Initialize the particle swarm P 1 : The position P(j) of each particle is randomly initialized, while its velocity v(j) is set to zero. The fuzzy entropy and permutation entropy are adopted as fitness functions to evaluate each particle. When the fitness values are smaller, the corresponding VMD decomposition parameters are better. Meanwhile, the noninferior solution in P 1 is stored in the save set N P .
C. Update the individual best particle P best and the global best particle G best , use the adaptive grid method to find the global optimal particle G best , and continuously evolve to generate the next generation particle population; perform the following steps before the save set reaches the maximum: (1). Calculate the density information of the particles in the save set, divide the target space into small areas by the grid, and measure the density by the number of particles in each area.
(2). The historical optimal position is updated when the particle's current position is better than the best position of the individual history. Then, the global optimal particle Gbest is selected for the particles in the population, and the selection is based on the density information of the particles. Specifically, for a particle in the save set, the lower the density value, the greater the probability of selection.
(3). Update the position and velocity of each particle. In addition, the particles search for the optimal solution under the leadership of G best and P best : where d represents the algebra of the current particle evolution, i represents the current evolutionary particle, c 1 and c 2 are the learning factors, µ is the contraction factor, R 1 and R 2 are the random numbers in the interval [0, 1], and P j i,d and G j i,d represent the value of the j-th decision vector of P best and G best of the particle, respectively. The save set is updated; after the evolution of the new generation group P d+1 , the non-inferior solutions in P d+1 are saved to the save set.
D. If the number of particles in the save set exceeds the set maximum value, the individuals in the dense range are replaced, and the individuals in the sparse range are retained to maintain the size of the save set. For a grid with more than one particle, calculate the number of particles ND to be deleted in the grid according to Formula (23), and then randomly delete the ND particles in the grid.
where A t is the quantity of particles in the save set, and Gird[k] is the quantity of particles in grid k. E. When the stop condition is reached, the iteration is stopped, the particle information in the storage set is output, and the optimal decomposition parameters [k best ,a best ] of VMD can be obtained.

Time-Frequency Peak Filtering (TFPF)
TFPF is a noise elimination technology introduced by Mesbah et al. [35]. Thanks to its ability to extract effective signals in a noisy environment, it has been applied widely in many engineering fields.
The TFPF algorithm is mainly based on Wigner-Ville distribution (WVD) and instantaneous frequency estimation theory to filter and de-noise signals. Due to its good time-frequency focusing property, WVD is widely used in engineering. However, when WVD processes multi-component signals, the resolution of time-frequency distribution of signals will be reduced due to the generation of cross terms, which leads to the weakening of time-frequency focusing of WVD. To suppress the cross terms in TFPF, the pseudo-Wiener-Ville distribution is adopted. According to the principle of TFPF, it is necessary to encode the noisy signal to make it become the analytic signal of instantaneous frequency firstly, and the estimated value of the effective signal can be obtained through estimating its instantaneous frequency.
The gyroscope output signal is mixed with noise: where x(t) and n(t) represent the useful signal and noise in the gyroscope output respectively, and the steps of TFPF are as follows [36]: Step 1. Through frequency modulation of signal y(t) containing noise, the analytic signal z(t) can be obtained: Here, µ is the frequency modulation index.
Step 2. The pseudo-Wigner-Ville distribution spectrum of the analytic signal z(t) is calculated: where t stands for time, τ stands for integral variable, f stands for frequency, z * stands for the conjugated operator of z, the window function is denoted as h(τ), and the window length is a tradeoff parameter of TFPF.
Step 3. According to the maximum likelihood estimation principle, the peak value of the PWVD distribution spectrum of the analytic signal is calculated as the instantaneous frequency estimation of the analytic signal, and the amplitude estimation of the original effective signal can be obtained: In general, the temperature compensation model is established to study the relationship between temperature and temperature drift in the gyroscope output. Unfortunately, the general temperature compensation models only consider the temperature itself, which leads to the unsatisfactory accuracy of the compensation model. To improve the model accuracy, the temperature change rate is added in this paper to describe the temperature field. Assume that the observed value of the gyroscope temperature field is M: where T represents the temperature, and ∆T describes the rate of temperature change.
After obtaining the temperature field and combining with the temperature drift in the gyroscope, the mapping model is established: where S is temperature drift and R(.) is the prediction function, which is a multi-input/singleoutput model established by neural network. The model framework is shown in Figure 4, and the Elman neural network optimized by the beetle antennae search algorithm is used as the learner.

Beetle Antennae Search Algorithm (BAS)
BAS is an efficient intelligent optimization algorithm that is inspired by the principle of beetles foraging. It does not need to know the specific form and gradient information of the function, and it only needs one individual in the target search. Due to its simple and flexible drifts, avoiding local optimal solutions and being suitable for multi-latitude search, BAS has shown wide applicability and high efficiency in solving complex optimization problems [37][38][39]. The basic principle of the BAS algorithm is that when the beetle is looking for food, it will randomly explore the information intensity in the nearby space by swinging the long antennae on the sides of its body. When one antenna detects a higher information intensity than the other, the beetle will move to the side with the higher information intensity, and it will continue to explore the information intensity at random until it finds food. The beetle foraging diagram and simplified beetle model are shown in Figure 5a,b, respectively [40].

Beetle Antennae Search Algorithm (BAS)
BAS is an efficient intelligent optimization algorithm that is inspired by the principle of beetles foraging. It does not need to know the specific form and gradient information of the function, and it only needs one individual in the target search. Due to its simple and flexible drifts, avoiding local optimal solutions and being suitable for multi-latitude search, BAS has shown wide applicability and high efficiency in solving complex optimization problems [37][38][39]. The basic principle of the BAS algorithm is that when the beetle is looking for food, it will randomly explore the information intensity in the nearby space by swinging the long antennae on the sides of its body. When one antenna detects a higher information intensity than the other, the beetle will move to the side with the higher information intensity, and it will continue to explore the information intensity at random until it finds food. The beetle foraging diagram and simplified beetle model are shown in Figure 5a,b, respectively [40]. zation problems [37][38][39]. The basic principle of the BAS algorithm is that when the beetle is looking for food, it will randomly explore the information intensity in the nearby space by swinging the long antennae on the sides of its body. When one antenna detects a higher information intensity than the other, the beetle will move to the side with the higher information intensity, and it will continue to explore the information intensity at random until it finds food. The beetle foraging diagram and simplified beetle model are shown in Figure 5a,b, respectively [40]. As shown in Figure 5b, the BAS algorithm simplifies the individual beetle and regards it as a particle that can sense the left and right direction. Since the direction of the beetle head is random, the right antennae of the beetle will generate a direction vector pointing to the left antennae to indicate the movement direction of the beetle. The steps of BAS algorithm are as follows [37,38]: 1. Create a random direction vector that represents the search behavior of beetle antennae: ( ,1) ( ,1) rands n S rands n = (30) Rands(.) is the random function, and n is the dimension of the search space. 2. Create the spatial coordinates of the antennae: As shown in Figure 5b, the BAS algorithm simplifies the individual beetle and regards it as a particle that can sense the left and right direction. Since the direction of the beetle head is random, the right antennae of the beetle will generate a direction vector pointing to the left antennae to indicate the movement direction of the beetle. The steps of BAS algorithm are as follows [37,38]: 1. Create a random direction vector that represents the search behavior of beetle antennae: → S = rands(n, 1) rands(n, 1) Rands(.) is the random function, and n is the dimension of the search space. 2. Create the spatial coordinates of the antennae: In Formula (31), x t is the beetle's spatial position at the t-th search, x lt and x rt respectively represent the spatial positions of the beetle's left and right antenna at the t-th search, and d t is the sensing length of the antenna: the larger the sensing length, the stronger the searching ability of the beetle. Initially, the sensing length of the antennae is long enough to cover a suitable search area to jump out of the local minimum; then, it gradually decays over time. The information intensities detected by the left and right antenna are judged according to the fitness function, and the position of the beetle is updated: where x t+1 is the space position of the beetle at t+1-th search, λ t is the step size of the beetle's movement during the t-th search, sign(.) is a symbolic function, and f (.) is the fitness function. In addition, the updating rules of d t and λ t can be referred to as follows [31]:

Elman Neural Network (Elman NN)
The Elman NN is a typical dynamic recursive network proposed by Elman; compared with the three-layer structure of the BP neural network, the input layer of the Elman neural network contains context nodes, whose function is to remember the previous activation of hidden layer nodes, which makes the network have the adaptability of time-varying characteristics and thus increases the global stability of the network. The structure of the Elman NN is given in Figure 6 [22].
function. In addition, the updating rules of d and λ can be referred to as follows [31]: The Elman NN is a typical dynamic recursive network proposed by Elman; compared with the three-layer structure of the BP neural network, the input layer of the Elman neural network contains context nodes, whose function is to remember the previous activation of hidden layer nodes, which makes the network have the adaptability of timevarying characteristics and thus increases the global stability of the network. The structure of the Elman NN is given in Figure 6 [22]. Referring to the network structure of the Elman NN in Figure 6, the relationship between input and output is given [22]: Referring to the network structure of the Elman NN in Figure 6, the relationship between input and output is given [22]: where X(k) and Y(k) are the input layer and output layer vectors, respectively, W L1 , . . . , W L5 are the connection weights, a i , a j , and a k are the thresholds of each layer, h(·), g(·), and f (·) are the activation functions, and the activation functions h(x) and g(x) adopt a sigmoid function:

Elman Neural Network Based on Beetle Antennae Search Algorithm
Similar to the BP neural network, the initial weights and thresholds of the Elman NN are generated randomly, which may lead to problems such as local optimum and slow speed in the search process. This paper adopts the BAS algorithm to optimize the weights and thresholds of the Elman NN, and the steps of the BAS-Elman NN algorithm are as follows: 1. According to the structure of the Elman NN, the dimension of the BAS search space is determined.
2. Create the vector of the beetle antennae's orientation and the position of the center of mass.
3. Determine the fitness function for evaluation, and take the mean square error (MSE) of the expected and predicted output of training data as the fitness function: (y p,i − y e,i ) 2 (36) where N is the number of samples used for training, and y p,i and y e,i are the predicted value and expected value of the ith sample, respectively. When the iteration of the BAS algorithm terminates, the solution space vector with the smallest fitness value is the optimal solution. 4. The spatial positions of the left and right antennae are calculated, and the fitness function values of the left and right antennae are compared to update the position of the beetle.
5. Determine whether the stop condition is met. If not, return to Equation (4) and continue iterating.

Parallel Processing Model Based on MOVMD-TFPF and BAS-Elman NN
After introducing the above algorithms, the parallel processing model for eliminating noise and temperature drift based on MOVMD-TFPF and BAS-Elman NN is put forward. Figure 7 shows the processing process of the parallel processing model, and the main steps are as follows.
Step 1. The temperature change and the corresponding gyroscope output are obtained after the temperature experiment. Firstly, multi-objective particle swarm optimization is adopted to determine the optimal decomposition parameters [k best ,a best ] of the VMD, and the multi-objective optimized VMD (MOVMD) is obtained. Then, the output signal is decomposed by MOVMD, and some intrinsic mode functions (IMFs) can be obtained.
Step 2. Secondly, the sample entropy is adopted to classify the obtained IMFs into a noise segment, mixed segment, and drift segment.
Step 3. For the noise segment, it belongs to high-frequency noise and does not contain temperature trend and useful signals, so it can be directly discarded. For the mixed segment, it contains noise and useful components, so TFPF is used for denoising to obtain useful components. Then, BAS-Elman NN is trained by using the temperature and temperature change rate of the training set as the input and temperature drift as the output, and it is applied to predict the drift component in the drift segment to achieve the purpose of compensation.
Step 4. Finally, the denoised mixed segment and compensated drift segment are reconstructed to form the final gyroscope output. Step 3. For the noise segment, it belongs to high-frequency noise and does not contain temperature trend and useful signals, so it can be directly discarded. For the mixed segment, it contains noise and useful components, so TFPF is used for denoising to obtain useful components. Then, BAS-Elman NN is trained by using the temperature and temperature change rate of the training set as the input and temperature drift as the output, and it is applied to predict the drift component in the drift segment to achieve the purpose of compensation.
Step 4. Finally, the denoised mixed segment and compensated drift segment are reconstructed to form the final gyroscope output.

The Experimental Process
To certify the superiority of the parallel processing algorithm proposed in this paper, a gyroscope temperature output experiment is carried out. Figure 8 shows the experi-

The Experimental Process
To certify the superiority of the parallel processing algorithm proposed in this paper, a gyroscope temperature output experiment is carried out. Figure 8 shows the experimental equipment, and Figure 9 shows the structure package and circuit of the gyroscope. A detection circuit is designed in three PCB boards, respectively; the metal pins not only connect the electronic signals of the detection circuit but also can be used to connect three PCBs, which are wrapped with a rubber pad. In addition, after the PCB is wrapped with a rubber pad, it is placed in the metal shell; such doing can reasonably and effectively guard the structure of the chip, and it also can avoid severe impact on PCB. To avoid the interference of the electromagnetic field, the ground signal and the metal shell are connected. In addition, one of the above three PCBs is connected to the structural chip as a weak signal interface, and the other two PCB circuits play the role of induction circuit and drive circuit, respectively.
The experimental equipment is fitted with a temperature-controlled oven (Agilent 34401 A, Agilent, Santa Clara, CA, USA), a multimeter, a signal generator (Agilent 33220 A), and a DC power supply of ± 10 V (Agilent E3631 A).
The experimental procedure is as follows. First, we turned on the gyroscope and left it at room temperature for one hour. Later, we quickly heated the temperature-controlled oven to 60 • C to achieve the purpose of making the temperature in the gyroscope housing consistent with the controlled temperature (60 • C), and we maintained it in this state for one hour. When the temperature control oven dropped 10 • C, the gyroscope remained working in this state for one hour, and when the temperature decreased to -40 • C, the gyroscope continued working for the last hour, and then the experiment ended. It should be noted that two groups of experimental data were collected in this paper to train and test the model, and the subsequent algorithm demonstration and comparative analysis are based on the test set. The temperature controlling system precision of the oven is 0.1 • C, and this work employed a temperature sensor to detect the temperature information inside the gyroscope and make sure the gyroscope output signal and temperature value are collected synchronously. are based on the test set. The temperature controlling system precision of the oven is 0.1 °C, and this work employed a temperature sensor to detect the temperature information inside the gyroscope and make sure the gyroscope output signal and temperature value are collected synchronously.

The Experimental Results
The experiment results are given in Figure 10; in the experiment, the temperature varies from 60 to -40 °C with time, and the gyroscope output contains rich noise; meanwhile, the output drifts with the temperature, and the influence of drift makes the output of the gyroscope change from 0.115° to 0.15°. Since the output of the gyroscope is accompanied by drift and noise, a parallel processing model is needed to eliminate these influences.

The Experimental Results
The experiment results are given in Figure 10; in the experiment, the temperature varies from 60 to -40 • C with time, and the gyroscope output contains rich noise; meanwhile, the output drifts with the temperature, and the influence of drift makes the output of the gyroscope change from 0.115 • to 0.15 • . Since the output of the gyroscope is accompanied by drift and noise, a parallel processing model is needed to eliminate these influences. According to the algorithm steps, the experimental signals need to be decomposed by VMD firstly, and the optimal parameters [kbest,abest] need to be determined by MOPSO before decomposition. The parameters for MOPSO are set as follows: Np and NR are set at 40, the maximum iteration number M is set at 10, the inertia weight W is set at 0.4, while the learning factors c1, c2 are both set at 1.5. The optimization range of parameter k and a are set as [4,12] and [1000,10000], respectively. The convergence evolution of particles is shown in Figure 11, where the Pareto front optimal solution set is marked in red, and the optimal decomposition parameter [kbest, abest] = [8,9755]. According to the algorithm steps, the experimental signals need to be decomposed by VMD firstly, and the optimal parameters [k best ,a best ] need to be determined by MOPSO before decomposition. The parameters for MOPSO are set as follows: Np and N R are set at 40, the maximum iteration number M is set at 10, the inertia weight W is set at 0.4, while the learning factors c 1 , c 2 are both set at 1.5. The optimization range of parameter k and a are set as [4,12] and [1000, 10000], respectively. The convergence evolution of particles is shown in Figure 11, where the Pareto front optimal solution set is marked in red, and the optimal decomposition parameter [k best , a best ] = [8,9755]. by VMD firstly, and the optimal parameters [kbest,abest] need to be determined by MOPSO before decomposition. The parameters for MOPSO are set as follows: Np and NR are set at 40, the maximum iteration number M is set at 10, the inertia weight W is set at 0.4, while the learning factors c1, c2 are both set at 1.5. The optimization range of parameter k and a are set as [4,12] and [1000,10000], respectively. The convergence evolution of particles is shown in Figure 11, where the Pareto front optimal solution set is marked in red, and the optimal decomposition parameter [kbest, abest] = [8,9755]. After decomposition, the eight-layer IMFs are obtained, and it is shown in Figure 12 that there is an obvious trend of temperature drift in IMF1, and it is difficult to distinguish whether IMF2-IMF8 belong to noise or the mixture of drift and noise. So, the sample entropy [41] is adopted to distinguish these IMFs. Figure 13a,b respectively show the sample entropy of each IMF and the classification results, and the IMFs are divided into a drift segment, mixed segment, and noise segment.
Subsequently, the parallel processing model is implemented. For the noisy segment, it is considered as a useless signal and can be discarded directly. For the mixed segment, it is a mixture of noise and drift caused by temperature changes. Therefore, TFPF is adopted here to denoise the mixed segment to preserve useful components. For the drift segment, it is mainly the drift caused by the temperature change, showing a nonlinear change trend. In this paper, the temperature compensation model based on the BAS-Elman NN is used to deal with it. Figure 14 shows the comparison of mixed segments before and after denoising by TFPF; then, the denoised gyroscope output signal can be obtained by reconstructing the denoised mixed segment and the uncompensated drift segment. In Figure 15, the noise in the gyroscope output signal is basically eliminated.
Subsequently, the parallel processing model is implemented. For the noisy segment, it is considered as a useless signal and can be discarded directly. For the mixed segment, it is a mixture of noise and drift caused by temperature changes. Therefore, TFPF is adopted here to denoise the mixed segment to preserve useful components. For the drift segment, it is mainly the drift caused by the temperature change, showing a nonlinear change trend. In this paper, the temperature compensation model based on the BAS-Elman NN is used to deal with it. Figure 12. The decomposition of gyroscope output by MOVMD. Figure 12. The decomposition of gyroscope output by MOVMD.  Figure 14 shows the comparison of mixed segments before and after denoising by TFPF; then, the denoised gyroscope output signal can be obtained by reconstructing the denoised mixed segment and the uncompensated drift segment. In Figure 15, the noise in the gyroscope output signal is basically eliminated.   Figure 14 shows the comparison of mixed segments before and after denoising by TFPF; then, the denoised gyroscope output signal can be obtained by reconstructing the denoised mixed segment and the uncompensated drift segment. In Figure 15, the noise in the gyroscope output signal is basically eliminated. Figure 14. The comparison of mixed segments before and after denoising by TFPF. Figure 14. The comparison of mixed segments before and after denoising by TFPF. Figure 14. The comparison of mixed segments before and after denoising by TFPF. Figure 15. The comparison of gyroscope output signals before and after denoising. Figure 15. The comparison of gyroscope output signals before and after denoising.
Before using the Elman NN for temperature compensation, the beetle antenna search algorithm is adopted to optimize the network weights of Elman NN to improve its learning accuracy. Figure 16a,b are the comparison of predicted output of Elman NN before and after optimization and real output respectively; it can be clearly seen that the prediction accuracy of the optimized Elman NN has been significantly improved. Figure 17a is the real output of the drift segment and the predicted output based on temperature and temperature change rate; Figure 17b is the compensation result of the drift segment.
Then, the denoised mixed segment and the compensated drift segment are reconstructed to form the final gyroscope output; the noise and temperature drift in the gyroscope output signal have been well eliminated in Figure 18. Before using the Elman NN for temperature compensation, the beetle antenna search algorithm is adopted to optimize the network weights of Elman NN to improve its learning accuracy. Figure 16a,b are the comparison of predicted output of Elman NN before and after optimization and real output respectively; it can be clearly seen that the prediction accuracy of the optimized Elman NN has been significantly improved. Figure 17a is the real output of the drift segment and the predicted output based on temperature and temperature change rate; Figure 17b is the compensation result of the drift segment.
Then, the denoised mixed segment and the compensated drift segment are reconstructed to form the final gyroscope output; the noise and temperature drift in the gyroscope output signal have been well eliminated in Figure 18.    In order to further highlight the superiority of the proposed compensation scheme, we compare it with the wavelet threshold denoising (WTD), the BP neural network, and the advanced temperature compensation algorithm that combines the empirical mode decomposition (EMD), wavelet threshold denoising (WTD), and simulated annealing optimized BP neural network. As an error analysis method in IEEE standard [42], Allan variance analysis is widely used in the stochastic error modeling of inertial devices. In this paper, the angle random walk and bias stability of the original gyroscope output and the compensated gyroscope output are quantitatively analyzed by Allan variance analysis. Comparison results are given in Figure 19; the accuracy of the final signal obtained by direct denoising or compensating the gyroscope output signal is very low. Even compared with the advanced EMD + WTD + SA-BP algorithm, the proposed scheme is still superior. After denoising and compensation of the proposed scheme, the angle random walk of the original gyroscope output decreases from 0.531076 to 5.22502 × 10 -3°/ h/√Hz, and the bias stability is reduced from 32.7364°/h to 0.140403°/h; this proves the superior performance of the proposed parallel processing model. In order to further highlight the superiority of the proposed compensation scheme, we compare it with the wavelet threshold denoising (WTD), the BP neural network, and the advanced temperature compensation algorithm that combines the empirical mode decomposition (EMD), wavelet threshold denoising (WTD), and simulated annealing optimized BP neural network. As an error analysis method in IEEE standard [42], Allan variance analysis is widely used in the stochastic error modeling of inertial devices. In this paper, the angle random walk and bias stability of the original gyroscope output and the compensated gyroscope output are quantitatively analyzed by Allan variance analysis. Comparison results are given in Figure 19; the accuracy of the final signal obtained by direct denoising or compensating the gyroscope output signal is very low. Even compared with the advanced EMD + WTD + SA-BP algorithm, the proposed scheme is still superior. After denoising and compensation of the proposed scheme, the angle random walk of the original gyroscope output decreases from 0.531076 to 5.22502 × 10 −3• /h/ √ Hz, and the bias stability is reduced from 32.7364 • /h to 0.140403 • /h; this proves the superior performance of the proposed parallel processing model.  Figure 19. Allan variance analysis of the output signals before and after processing.

Conclusions
In this paper, a parallel processing model based on MOVMD-TFFPF and BAS-Elman NN is established to eliminate the noise and temperature drift in the MEMS gyroscope's output signal. Firstly, the multi-objective particle swarm optimization is adopted to determine the optimal decomposition parameters of the VMD, and the multi-objective optimized VMD (MOVMD) is obtained. After MOVMD decomposition and SE classification, the gyroscope output signal is divided into a noise segment, mixed segment, and drift segment; then, the noise segment is discarded directly, and the mixed segment is denoised by TFPF. Subsequently, the double-input/single-output temperature compensation model based on the BAS-Elman NN is established to compensate the drift segment. Finally, the denoised mixed segment and compensated drift segment are reconstructed to form the final gyroscope output, and the experimental results and comparative analysis verify the validity of the parallel processing model.

Conclusions
In this paper, a parallel processing model based on MOVMD-TFFPF and BAS-Elman NN is established to eliminate the noise and temperature drift in the MEMS gyroscope's output signal. Firstly, the multi-objective particle swarm optimization is adopted to determine the optimal decomposition parameters of the VMD, and the multi-objective optimized VMD (MOVMD) is obtained. After MOVMD decomposition and SE classification, the gyroscope output signal is divided into a noise segment, mixed segment, and drift segment; then, the noise segment is discarded directly, and the mixed segment is denoised by TFPF. Subsequently, the double-input/single-output temperature compensation model based on the BAS-Elman NN is established to compensate the drift segment. Finally, the denoised mixed segment and compensated drift segment are reconstructed to form the final gyroscope output, and the experimental results and comparative analysis verify the validity of the parallel processing model.